PrUE: Distilling Knowledge from Sparse Teacher Networks
Authors
Abstract
Although deep neural networks have enjoyed remarkable success across a wide variety of tasks, their ever-increasing size also imposes significant overhead on deployment. To compress these models, knowledge distillation was proposed to transfer knowledge from a cumbersome (teacher) network into a lightweight (student) network. However, guidance from the teacher does not always improve the generalization of students, especially when the gap between student and teacher is large. Previous works argued that this was due to the high certainty of the teacher, which results in harder labels that are difficult to fit. To soften these labels, we present a pruning method termed Prediction Uncertainty Enlargement (PrUE) to simplify the teacher. Specifically, our method aims to decrease the teacher's certainty about the data, thereby generating softer predictions for students. We empirically investigate its effectiveness with experiments on CIFAR-10/100, Tiny-ImageNet, and ImageNet. Results indicate that students trained with sparse teachers achieve better performance. Besides, our method allows researchers to distill knowledge from deeper teachers into students, improving them further. Our code is made public at: https://github.com/wangshaopu/prue.
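To make the setup concrete, the sketch below shows the standard soft-label distillation loss together with a prediction-entropy measure of teacher uncertainty, which is the quantity PrUE seeks to enlarge by pruning. This is a minimal PyTorch illustration under assumed names and hyperparameters (distillation_loss, prediction_entropy, T, alpha), not the code released at the repository above.

```python
# Minimal sketch (not the authors' implementation): soft-label distillation
# from a (possibly pruned) teacher, plus the teacher's prediction entropy,
# an uncertainty measure that a PrUE-style pruning would aim to increase.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend KL divergence to the teacher's temperature-softened predictions
    with the usual cross-entropy to the ground-truth labels."""
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

def prediction_entropy(logits):
    """Average entropy of the predictive distribution; a less certain
    (e.g. sparser) teacher yields higher entropy, i.e. softer labels."""
    probs = F.softmax(logits, dim=1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
```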
Similar Resources
Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition
Speech recognition systems that combine multiple types of acoustic models have been shown to outperform single-model systems. However, such systems can be complex to implement and too resource-intensive to use in production. This paper describes how to use knowledge distillation to combine acoustic models in a way that has the best of many worlds: It improves recognition accuracy significantly,...
Distilling Knowledge from Deep Networks with Applications to Healthcare Domain
Exponential growth in Electronic Healthcare Records (EHR) has resulted in new opportunities and urgent needs for discovery of meaningful data-driven representations and patterns of diseases in Computational Phenotyping research. Deep Learning models have shown superior performance for robust prediction in computational phenotyping tasks, but suffer from the issue of model interpretability which...
Distilling Task Knowledge from How-To Communities
Knowledge graphs have become a fundamental asset for search engines. A fair amount of user queries seek information on problem-solving tasks such as building a fence or repairing a bicycle. However, knowledge graphs completely lack this kind of how-to knowledge. This paper presents a method for automatically constructing a formal knowledge base on tasks and task-solving steps, by tapping the co...
Distilling Model Knowledge
Top-performing machine learning systems, such as deep neural networks, large ensembles and complex probabilistic graphical models, can be expensive to store, slow to evaluate and hard to integrate into larger systems. Ideally, we would like to replace such cumbersome models with simpler models that perform equally well. In this thesis, we study knowledge distillation, the idea of extracting the...
Face Model Compression by Distilling Knowledge from Neurons
Recent advanced face recognition systems were built on large Deep Neural Networks (DNNs) or their ensembles, which have millions of parameters. However, the expensive computation of DNNs makes their deployment difficult on mobile and embedded devices. This work addresses model compression for face recognition, where the learned knowledge of a large teacher network or its ensemble is utilized...
Journal
Journal Title: Lecture Notes in Computer Science
Year: 2023
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-26409-2_7